Abstract

We introduce a new phylogenetic reconstruction algorithm which, 
unlike most previous rigorous inference techniques, does not rely on 
assumptions regarding the branch lengths or the depth of the tree. The 
algorithm returns a forest which is guaranteed to contain all edges that 
are: 1) sufficiently long and 2) sufficiently close to the leaves. How 
much of the true tree is recovered depends on the sequence length 
provided. The algorithm is distance-based and runs in polynomial 
time. 

1 Introduction 

In Evolutionary Biology, the speciation history of a family of related or- 
ganisms is generally represented graphically by a phylogeny, that is, a tree 
where the leaves are the observed (extant) species and the branchings in- 
dicate speciation events. Traditional approaches for reconstructing phy- 
logenies from homologous molecular sequences extracted from the observed 
species [Fel04, SS03] are typically computationally intractable [GF82, DS86, 
Day87, CT06, Roc06], statistically inconsistent [Fel78], or they require im- 
practical sequence lengths [Att99, LC06, SS99, SS02]. Nevertheless, over 
the past decade, much progress has been made in the design of efficient, 
fast-converging reconstruction techniques, starting with the seminal work 
of Erdos et al. [ESSW99a]. The algorithm in [ESSW99a], often dubbed 
the Short Quartet Method (SQM), is based on well-known distance-matrix 
techniques, that is, it relies on estimates of the evolutionary distance between
each pair of species (roughly the time elapsed since their most recent
common ancestor). However, unlike other popular distance methods such
as Neighbor-Joining [SN87], the key behind SQM's performance is that it
discards long evolutionary distances, whose estimates from sequence com- 
parisons are known to be statistically unreliable. The algorithm works by 
first building subtrees of small diameter and, in a second stage, glueing the 
pieces back together. 

The Short Quartet Method is in fact guaranteed to return the correct 
topology from polynomial-length sequences in polynomial time with high 
probability. But this appealing theoretical performance comes at a price. 
The results of [ESSW99a] rely critically on biological assumptions which, 
although reasonable, are often not met in practice (see Section 1.3 for a 
formal statement): 

a) [Dense Sampling of Species] The observed species are "closely related." 
In particular, there are no exceptionally long branches in the phy- 
logeny. 

b) [Absence of Polytomies] The phylogeny is bifurcating. In fact, Erdos 
et al. assume that speciation events are sufficiently far apart to be 
easily distinguished. 

The point of a) is that it implies a natural bound on the depth of the tree 
which in turn ensures that enough information about the deep parts of the 
tree diffuses to the leaves. As for Assumption b) , it guarantees that a clear 
signal can be extracted from each branch of the phylogeny. It is obvious — at 
least intuitively — that assumptions such as a) and b) are necessary to secure 
the type of results Erdos et al. obtain: the guaranteed reconstruction of the 
full phylogeny. Hence, to improve over SQM and obtain strong guarantees 
under more general conditions, one has to relax this last requirement. 

In this paper, we design an algorithm which provides strong reconstruc- 
tion guarantees without Assumptions a) and b) . We show that our algorithm 
is guaranteed to recover a forest containing all edges that are "sufficiently 
long" and "sufficiently close" to the leaves. In fact, we allow a trade-off 
between the resolution of short branches and the depth of the reconstructed 
forest, a feature of potential practical interest. Also, we guarantee that our 
reconstructed forest has the desirable property of being disjoint (although 
the presence of short edges leads us to allow deep intersections of very short 
branches between the subtrees). Moreover, our algorithm does not require 
the knowledge of a priori bounds on branch lengths or tree depth. Finally,
if Assumptions a) and b) are satisfied, we recover the whole phylogeny and
provide an alternative to the algorithm of Erdos et al.

Precise statements are given in Section 1.2. For a full comparison to 
related work see also Section 1.3. 

1.1 What can we hope to reconstruct? 

Well-known identifiability results [Cha96] guarantee that phylogenies — or 
at least their idealized stochastic models — can be fully reconstructed given 
enough data at the leaves. However, molecular data gathered from current 
species are in essence limited, which begs the question: How much of the 
tree can we really hope to reconstruct? We pointed out above two important 
sources of difficulties: short branches produce a weak signal that may be 
hard to detect; similarly, untangling the deep parts of the tree presents 
challenges that are well documented (see, e.g., [PL98, CDvM+06]). Note 
that these issues are fundamentally "information-theoretic" and affect all 
reconstruction methods. 

To avoid these difficulties, most rigorous methods impose restrictions 
on the length of the branches and/or the depth of the tree, which may be 
unsatisfactory from a practical perspective. On the other hand, methods 
commonly used in practice, such as likelihood and Bayesian methods,
typically produce several candidate trees as well as confidence estimates. But
theoretical guarantees on the quality of such outputs are hard to obtain. 

Here, we seek to give strong reconstruction guarantees without any as- 
sumption on the true phylogeny. Our goal is to recover, for any given amount 
of data, as much of the tree as can rigorously be reconstructed with high 
confidence. Since the full phylogeny may not always be recoverable, we are 
led to a more flexible solution concept: we output a contracted subforest of 
the true phylogeny. That is, we output a forest containing all branches that 
are "sufficiently long" and "sufficiently recent" ; note that "sufficiently" here 
is determined (information-theoretically) by the size of the data (usually in 
terms of sequence length). In the remainder of this section we formalize this 
solution concept. 

The input. Formally, a phylogeny is a weighted, multifurcating tree on
a set of leaves $L$, which we identify with the labels $[n] = \{1, \ldots, n\}$. We
denote a phylogeny by $T = (V, E; L, \lambda)$. Here $V$ and $E$ are respectively the
vertex and edge sets of the tree, and $\lambda : E \to (0, +\infty)$ assigns a weight to
each edge (the branch length). We assume that all internal vertices $V - L$
have degree at least 3.
Figure 1: The effect of distance distortion from the perspective of a leaf.
On the left hand side is the true phylogeny. On the right hand side, only 
distances within a certain radius represent accurately the metric underlying 
the phylogeny. 

A phylogeny is naturally equipped with a so-called additive metric on
the leaves $d : L \times L \to (0, +\infty)$ defined as follows:

\[ \forall u, v \in L, \quad d(u, v) = \sum_{e \in P_T(u, v)} \lambda_e, \]

where $P_T(u, v)$ is the set of edges on the path between $u$ and $v$ in $T$. Often
$d(u, v)$ is referred to as the "evolutionary distance" between species $u$ and
$v$. Since under the assumptions above there is a one-to-one correspondence
between $d$ and $\lambda$, we write either $T = (V, E; L, d)$ or $T = (V, E; L, \lambda)$. We
also sometimes use the natural extension of $d$ to the internal vertices of $T$.
We denote by $\mathbb{T}$ the set of all phylogenies on any number of leaves.

It is well-known that given an additive metric $d$ one can reconstruct
the corresponding phylogeny $T$. However, in practice, one can only derive
an estimate $\hat{d}$ of $d$, the accuracy of which depends on the amount of data
used. (This estimate is known in the literature as the "distance matrix".)
Our goal in this paper is to reconstruct a phylogeny — or as much of it as 
possible — from this "distorted" version of its additive metric. A typical 
property of distance estimates is that estimates of long distances are unreli- 
able. The following definition formalizes this phenomenon. See Figure 1 for 
an illustration. 

Definition 1 (Distorted Metric [Mos07, KZZ03]) Let $T = (V, E; L, d)$
be a phylogeny and let $\tau, M > 0$. We say that $\hat{d} : L \times L \to (0, +\infty]$ is a
$(\tau, M)$-distorted metric for $T$ or a $(\tau, M)$-distortion of $d$ if:

1. [Symmetry] For all $u, v \in L$, $\hat{d}$ is symmetric, that is,

\[ \hat{d}(u, v) = \hat{d}(v, u); \]

2. [Distortion] $\hat{d}$ is accurate on "short" distances, that is, for all $u, v \in L$,
if either $d(u, v) < M + \tau$ or $\hat{d}(u, v) < M + \tau$ then

\[ \left| d(u, v) - \hat{d}(u, v) \right| < \tau. \]



In phylogenetic reconstruction, a distorted metric is naturally derived from
samples of a Markov model on a tree, a common model of DNA sequence
evolution used in Biology. (See Appendix A for details.) In the remainder of
this paper, we assume that we are given a $(\tau, M)$-distortion $\hat{d}$ of an additive
metric $d$ and we seek to recover the underlying phylogeny $T$.
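
As an illustration of the distorted metric definition, its two conditions can be checked mechanically on small examples. The following is a minimal sketch and not part of the algorithm itself; the function name `is_distorted_metric` and the dictionary representation of the true metric and of its estimate are assumptions made for this example.

```python
from itertools import combinations


def is_distorted_metric(d, d_hat, leaves, tau, M):
    """Return True if d_hat is a (tau, M)-distortion of d.

    d and d_hat map ordered pairs (u, v) of leaves to distances
    (hypothetical representation; d_hat may contain float('inf')).
    """
    for u, v in combinations(leaves, 2):
        # Condition 1: the estimate must be symmetric.
        if d_hat[(u, v)] != d_hat[(v, u)]:
            return False
        # Condition 2: accuracy is required only on "short" distances,
        # i.e. when the true or the estimated distance is below M + tau.
        if d[(u, v)] < M + tau or d_hat[(u, v)] < M + tau:
            if abs(d[(u, v)] - d_hat[(u, v)]) >= tau:
                return False
    return True
```

Note that long distances are unconstrained: an estimate may be arbitrarily wrong, or even infinite, on a pair of leaves as long as both the true and the estimated distance are at least $M + \tau$.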



Contraction and pruning. Given only a $(\tau, M)$-distorted metric, it is
clear that the best we can hope for in general is to reconstruct a forest
containing those edges of $T$ that are "sufficiently close" to the leaves. Indeed,
note that two phylogenies that are identical up to depth $M$ from the
leaves, but are otherwise different, can give rise to the same distorted metric.
Moreover, since we do not assume that edges are longer than the accuracy
$\tau$, some edges may be too short to be reconstructed and, as we mentioned
before, we allow ourselves to instead contract them. Hence, we are led to
consider subforests of the true phylogeny where deep edges are pruned and
short edges are contracted.

To formalize this idea we need a few definitions. Let us first describe
what we mean by a subforest of a phylogeny $T = (V, E; L, d)$. Given a set
of vertices $V' \subseteq V$, the subtree of $T$ restricted to $V'$ is the tree obtained
1) by keeping only nodes and edges on paths between vertices in $V'$ and
then 2) by contracting all paths composed of vertices of degree 2, except
the nodes in $V'$. See Figure 2 for an example. We denote this tree by $T|_{V'}$.
We typically take $V' \subseteq L$. A subforest of $T$ is defined to be a collection of
restricted subtrees of $T$.

We also need a notion of depth. Given an edge $e \in E$, the chord depth
of $e$ is the length of the shortest path among all paths crossing $e$ between
two leaves. That is,

\[ \Delta_c(e) = \min\{ d(u, v) : u, v \in L,\ e \in P_T(u, v) \}. \]

We define the chord depth of a tree $T$ to be the maximum chord depth in $T$:

\[ \Delta_c(T) = \max\{ \Delta_c(e) : e \in E \}. \]
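
To make the notion of chord depth concrete, here is a small sketch that computes it by brute force on a weighted tree. It is only an illustration under assumed conventions: the tree is stored as a nested dictionary `adj[x][y] = weight`, leaves are the vertices of degree 1, and all function names are hypothetical.

```python
def leaves_on_side(adj, a, b):
    """Leaves reachable from a without crossing the edge (a, b)."""
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if (x == a and y == b) or y in seen:
                continue
            seen.add(y)
            stack.append(y)
    return {x for x in seen if len(adj[x]) == 1}


def distances_from(adj, source):
    """Path lengths from source to every vertex of the weighted tree."""
    dist, stack = {source: 0.0}, [source]
    while stack:
        x = stack.pop()
        for y, w in adj[x].items():
            if y not in dist:
                dist[y] = dist[x] + w
                stack.append(y)
    return dist


def chord_depth(adj, a, b):
    """Length of the shortest leaf-to-leaf path crossing the edge (a, b)."""
    side_b = leaves_on_side(adj, b, a)
    best = float("inf")
    for u in leaves_on_side(adj, a, b):
        dist_u = distances_from(adj, u)
        best = min(best, min(dist_u[v] for v in side_b))
    return best


def tree_chord_depth(adj):
    """Maximum chord depth over all edges (each edge visited once)."""
    return max(chord_depth(adj, a, b) for a in adj for b in adj[a] if a < b)
```

On a tree with two cherries joined by a long internal edge, the internal edge has the largest chord depth, since every leaf-to-leaf path crossing it must traverse its full length.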
Figure 2: Restricting the top tree to its white nodes.



Definition 2 (Contracted Subforest) Let $T = (V, E; L, d)$ be a phylogeny.
Fix $M > 0$. Let $\{L_1, \ldots, L_q\}$ be the natural partition of the leaf set $L$
obtained by removing all edges $e \in E$ such that $\Delta_c(e) > M$. We define the
$M$-pruned subforest of $T$ to be the forest $F_M(T) = (V_M, E_M)$ consisting
of the trees $\{T|_{L_1}, \ldots, T|_{L_q}\}$. The metric $d$ is extended as follows: for all
$u, v \in L$,

\[ d_M(u, v) = \begin{cases} d(u, v), & \text{if } u, v \text{ are in the same subtree of } F_M(T), \\ +\infty, & \text{o.w.} \end{cases} \]

We also denote by $\lambda_M$ the edge lengths of $F_M(T)$.

Now, given also $\tau > 0$, the $\tau$-contracted $M$-pruned subforest of $T$ is the
forest $F_{\tau,M}(T) = (V_{\tau,M}, E_{\tau,M})$ obtained from $F_M(T)$ by contracting edges
$e \in E_M$ of weight $\lambda_M(e) < \tau$.

Path-disjointness. We require that the trees of our reconstructed forest
are "non-intersecting". This is a natural condition to impose in order to
obtain a meaningful reconstruction: we want to avoid as much as possible
that the same branches appear in many subtrees. In fact, we can only
guarantee approximate disjointness as defined below.

We first need a notion of depth for vertices. For a phylogeny $T =
(V, E; L, d)$ and a vertex $x \in V$, the vertex depth of $x$ is the length of the
shortest path between $x$ and the set of leaves. That is,

\[ \Delta_v(x) = \min\{ d(u, x) : u \in L \}. \]
Given two leaves $u, v$ of $T$, we denote by $\mathcal{P}_T(u, v)$ the set of vertices on the
path between $u$ and $v$ in $T$ (recall that $P_T(u, v)$ denotes the set of edges on this path).

We say that two trees are $(\tau, M)$-path-disjoint if they are "almost
disjoint" in the sense that they only share edges (if any) that are "deep"
(endpoints have vertex depth at least $M/2$) and "short" (length at most $\tau$).
More formally:

Definition 3 (Approximate Path-Disjointness) Let $T = (V, E; L, d)$
be a phylogeny. Two subtrees $T_1, T_2$ of $T$ restricted respectively to $L_1, L_2 \subseteq L$
are $(\tau, M)$-path-disjoint if $L_1 \cap L_2 = \emptyset$ and for all pairs of leaves $u_1, v_1 \in L_1$
and $u_2, v_2 \in L_2$ such that

\[ \mathcal{P}_T(u_1, v_1) \cap \mathcal{P}_T(u_2, v_2) \neq \emptyset, \]

we have:

\[ \min\{ \Delta_v(x) : x \in \mathcal{P}_T(u_1, v_1) \cap \mathcal{P}_T(u_2, v_2) \} \geq \tfrac{1}{2} M, \]

and, if further $P_T(u_1, v_1) \cap P_T(u_2, v_2) \neq \emptyset$,

\[ \max\{ \lambda_e : e \in P_T(u_1, v_1) \cap P_T(u_2, v_2) \} \leq \tau. \]

More generally, a collection of restricted subtrees $T_1, \ldots, T_q$ of $T$ are
$(\tau, M)$-path-disjoint if they are pairwise $(\tau, M)$-path-disjoint. In the case $\tau = 0$, we
simply say that the subtrees are path-disjoint.

1.2 Main result and corollaries

Main result. Our main result is an algorithm which, given a $(\tau, M)$-distorted
metric, reconstructs a contracted subforest (of the true phylogeny)
whose trees are approximately path-disjoint. Typically, $M$ is much larger
than $\tau$. In that case, we reconstruct a subforest of $T$ with chord depth
$\approx \frac{1}{2} M$ which includes all edges of length at least $4\tau$. The reconstructed
subtrees may "overlap" on edges of length at most $2\tau$ at vertex depth $\approx \frac{1}{4} M$.
In Section 4, we show that these parameters are essentially optimal. The
algorithm runs in polynomial time.
More precisely, we show:

Theorem 1 (Main Result) Let $\tau$ and $M$ be monotone functions of $n$ with
$M > 3\tau$. Let $m > 3\tau$ be such that

\[ m < \tfrac{1}{2}\,[M - 3\tau], \]

for all $n$. Then, there is a polynomial-time algorithm $A$ such that, for all
phylogenies $T = (V, E; L, d)$ in $\mathbb{T}$ with $|L| = n$ and all $(\tau, M)$-distortions $\hat{d}$
of $d$, $A$ applied to $\hat{d}$ satisfies the following:

1. [Approximate Path Disjointness] $A$ returns a $(2\tau, m - 3\tau)$-path-disjoint
subforest $F$ of $T$;

2. [Depth Guarantee] The forest $F$ is a refinement of $F_{4\tau, m-\tau}(T)$.

We give below a few important special cases of Theorem 1.

Tree case. When the amount of data is sufficient to produce a distorted
metric with $M = O(\Delta_c(T))$, we get a single component, that is, the full tree
(up to those edges that are contracted).

Corollary 1 (Tree Case) Let $\tau > 0$ and $M > 2\Delta_c(T) + 5\tau$. Then, choosing
$m > \Delta_c(T) + \tau$ guarantees that the reconstructed forest is composed of
only a tree.

In the case of "dense" phylogenies, $M = \Omega(\log n)$ is sufficient to reconstruct
the full tree.

Definition 4 (Dense Phylogenies (see e.g. [ESSW99a])) We say that
a collection of phylogenies $\mathbb{T}'$ is dense if there is a $0 < g < +\infty$ (independent
of $n$) such that for all $T = (V, E; L, \lambda) \in \mathbb{T}'$ we have

\[ \forall e \in E, \quad \lambda_e < g. \qquad (1) \]

We denote by $\mathbb{T}_g$ the set of phylogenies satisfying (1).

Corollary 2 (Dense Case) In the case of dense phylogenies, $M = \Omega(\log n)$
suffices to guarantee the reconstruction of the full tree, up to contracted
edges.

Absolute variant. All rigorous algorithms prior to our work (see Section 1.3)
require knowledge of either the tree depth or bounds on the edge
lengths to give strong reconstruction guarantees. This is not satisfactory
from a practical point of view. Here, given only the sequence length, we
provide explicit guarantees. The following result assumes that the distorted
metric is derived from a Markov model on a tree. (See Appendix A for
details.)
Corollary 3 (Absolute Variant) Given a number of samples $k = \Omega(\log n)$
from a Markov model on a tree and a chosen level of contraction $\varepsilon > 0$
(small), one can choose $\tau, M, m$ so that $A$ is guaranteed to return a (contracted)
subforest of $T$ containing $F_{\varepsilon, M'}(T)$ with probability $1 - o(1)$, where
$M' = \Omega_\varepsilon(\log k - \log\log n)$.

Complete resolution. Finally we remark that, if we further assume that
all branch lengths are bounded from below by a constant, then by choosing
$\tau$ accordingly a non-contracted forest is returned. In particular, we can
recover the results of [ESSW99a].

1.3 Related work 

Under a Markov model of evolution, the Short Quartet Method (SQM) of 
Erdos et al. [ESSW99a] is guaranteed to recover the full phylogeny as long 
as the number of samples k satisfies 

fc>c/-V'^^=(^)logn, 

for constants $c, c' > 0$, where $f$ and $g$ are respectively lower and upper
bounds on the branch lengths possibly depending on $n$. For instance, if $f$
and $g$ are constants the sequence length needed for complete reconstruction
depends polynomially on the number of species.

Mossel [Mos07] developed a framework that allows the reconstruction of 
a well-behaved forest when sequences are too short to guarantee a complete 
reconstruction. More precisely, edges which are too deep (in the sense of 
appearing only on paths between species whose distances are not accurately 
known) are pruned from the final reconstruction. At a high level, Mossel's 
Distorted Metric Method (DMM) (implicit in [Mos07]), works in a fashion 
similar to SQM — except for a pre-processing phase that clusters together 
sufficiently related species. However, for DMM to work, lower bounds on 
the branch lengths are required and, moreover, these must be known by the 
algorithm. Following up on [Mos07], Daskalakis et al. [DHJ+06] gave a variant
of DMM that runs without knowledge of a priori bounds on the branch
lengths or the tree depth, making their variant somewhat more practical.
However, like DMM, the algorithm in [DHJ+06] does not deal properly with 
short edges: any part of the tree containing a short edge cannot be recon- 
structed by the algorithm (even though there may be adjacent edges that 
are in fact reconstructible). Therefore, in the presence of short edges no 
guarantee can be given about the depth of the reconstructed forest. 
Figure 3: Comparison of methods.



Recently Gronau et al. [GMS08] eliminated the need for a lower bound on
the branch length by contracting edges whose length is below a user-defined 
threshold. Their solution uses a Directional Oracle (DO) which closes in on 
the location of a leaf to be added and, in the process, contracts regions that 
do not provide a reliable directional signal. Although the DO algorithm 
does not use an explicit bound on the depth of the tree, their reconstruction 
guarantee requires such a bound, similarly to [ESSW99a]. In particular, 
Gronau et al. leave open the question of giving a forest-building version of
their algorithm. Moreover, the sequence length in [GMS08] depends exponentially
on what the authors call the $\varepsilon$-diameter of the tree, essentially
the maximum diameter of the contracted regions. It is natural to conjecture 
that an optimal result should not depend on this parameter. 

For further related work on efficient phylogeny reconstruction, see also [ESSW99b,
HNW99, CK01, Csu02, KZZ03, MR06, DMR06].

1.4 Discussion of the results 

In Table 3 we summarize the current status as discussed in the previous 
sections. 

As the table emphasizes, our overarching goal is to design an algorithm 
with good reconstruction guarantees in the presence of both short and 
deep edges, whose execution does not rely on a priori bounds on branch 
lengths. Unfortunately, given the combinatorial complexity of Mossel's 
forest-building algorithm, it is not straightforward to provide the extra flex- 
ibility of edge contraction in this framework. The novelty in our work is 
twofold: 

• Solution Concept: A basic complication is that, in some sense, contraction
and pruning interfere with each other. Indeed, the presence
of unresolved branches at the boundary of partially reconstructed subtrees
creates the possibility of deep "undetectable" intersections. This
pitfall seems to be unavoidable. One of our main contributions is to introduce
the notion of approximate disjointness, which allows short but
deep intersections between subtrees of the reconstructed forest. This 
suitable solution concept leads to a quite simple algorithm with rea- 
sonable guarantees. Moreover, the flexibility in our definition allows 
us to recover all previously known results as special cases. 

• Algorithmic Technique: A natural approach to forest building used 
in [Mos07, DHJ+06] proceeds along the following three steps: 

1. first, leaves are grouped into clusters for which all pairwise dis- 
tances are accurately known (the small clusters) ; 

2. by definition, the local topologies on the small clusters can be
trivially reconstructed [Bun71];

3. finally, the local topologies that intersect in the true tree are 
"glued" together to get a forest (the resulting forest partitions 
the leaves into large clusters). 

This last step involves non-trivial combinatorial considerations. We 
have found that further allowing contracted edges makes this process 
somewhat unmanageable. Instead we use a different approach relying 
on simple metric arguments. In particular, we directly partition the 
leaves into large clusters, whose underlying subtrees are approximately 
disjoint, and provide a new straightforward method to reconstruct 
these subtrees. 

In addition, we obtain as special cases the results discussed in Section 1.3. 

In particular, if there are no short edges, we recover the results of [Mos07]
and [DHJ+06], where a path-disjoint forest is returned (by taking $\tau$ equal
to half the lower bound on the branch lengths in Theorem 1). If furthermore
there is an upper bound on the branch lengths, we recover the results
of [ESSW99a] (Corollary 2). Finally, if we keep the upper bound on the edge
lengths, but drop the lower bound, we recover the results of [GMS08] (Corollary 1).
In fact, we eliminate the dependence on the $\varepsilon$-diameter.^1 Further,
unlike [GMS08], we allow an arbitrary number of states, an extension (it
should be noted) that follows easily from [ESSW99b] and [Mos07].

^1 After the results of the current paper were posted on the arXiv, we were informed
by S. Moran that, in parallel to our work, the authors of [GMS08] have improved on
their previous results: the dependence on the $\varepsilon$-diameter has been removed. A preprint
of this work is currently available on the authors' website. Note however that this new,
independent work does not deal with deep edges and still makes assumptions similar
to [ESSW99a] restricting the depth of the generating tree.

1.5 Organization 

The rest of the paper is organized as follows. The algorithm is detailed 
in Section 2. The proof of our main theorem follows in Section 3. We 
conclude with a lower bound in Section 4 and a discussion of the running 
time in Section 5. Also, for completeness, in Appendix A we describe the 
probabilistic motivation behind the distorted metric definition. 

The results in this paper were announced without proof in [DMR09]. 
Also, the counter-example in Section 4 did not appear in [DMR09]. 

2 Algorithm 

The outline of the algorithm follows. There are three main phases, which
are explained in detail after the outline. The input to the algorithm is a
$(\tau, M)$-distorted metric $\hat{d}$ on $n$ leaves. In particular, we assume that the
values $\tau$ and $M$ are known to the algorithm (but see also Corollary 3). Let
$m$ be as in Theorem 1. We denote the true tree by $T = (V, E; L, d)$. The
subroutines Mini Contractor and Extender are detailed
in Figures 5 and 7 (see also their high-level description below).

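• Pre-Processing: Leaf Clustering. Build the distorted clustering
graph $\hat{H}_m = (\hat{V}_m, \hat{E}_m)$ where $\hat{V}_m = [n]$ and $(u, v) \in \hat{E}_m \iff \hat{d}(u, v) < m$;
compute the connected components $\{\hat{h}_m^{(i)} = (\hat{v}_m^{(i)}, \hat{e}_m^{(i)})\}_{i=1}^{q}$ of $\hat{H}_m$;

• Main Loop. For all components $i = 1, \ldots, q$:

  - For all pairs of leaves $u, v \in \hat{v}_m^{(i)}$ such that $(u, v) \in \hat{e}_m^{(i)}$:

    * Mini Reconstruction. Compute

      \[ \{\psi_j(u, v)\}_{j=1}^{r(u,v)} := \textsc{Mini Contractor}(\hat{h}_m^{(i)}; u, v); \]

    * Bipartition Extension. Compute

      \[ \{\bar{\psi}_j(u, v)\}_{j=1}^{r(u,v)} := \textsc{Extender}(\hat{h}_m^{(i)}, \{\psi_j(u, v)\}_{j=1}^{r(u,v)}; u, v); \]

  - Deduce the tree $\hat{T}^{(i)}$ from $\{\bar{\psi}_j(u, v)\}_{j=1}^{r(u,v)}$;

• Output. Return the resulting forest $F$.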
Figure 4: Illustration of routine Mini Contractor. See Figure 5 for notation.



Pre-processing: Leaf clustering. As mentioned before, given a $(\tau, M)$-distortion
we cannot hope to reconstruct edges that are too deep inside the
tree. This results in the reconstruction of a forest. Therefore, the first phase
of the algorithm is to determine the "support" of this forest. We proceed as
follows. Consider the following graph on $L$.

Definition 5 (Clustering Graph) Let $M' \in [\tau, M - \tau]$. The distorted
clustering graph with parameter $M'$, denoted $\hat{H}_{M'} = (\hat{V}_{M'}, \hat{E}_{M'})$, is the
following graph: the vertices $\hat{V}_{M'}$ are the leaves $L$ of $T$; two leaves $u, v \in L$
are connected by an edge $e = (u, v) \in \hat{E}_{M'}$ if

\[ \hat{d}(u, v) < M'. \qquad (2) \]

Note that this is an undirected graph because $\hat{d}$ is symmetric. Similarly, we
define the clustering graph with parameter $M'$, $H_{M'} = (V_{M'}, E_{M'})$, where
we use $d$ instead of $\hat{d}$ in (2).

The first phase of the algorithm consists in building the graph $\hat{H}_m$ from
$\hat{d}$. We then compute the connected components of $\hat{H}_m$ which we denote
$\{\hat{h}_m^{(i)}\}_{i=1}^{q}$. In the next two phases, we build a tree on each of these components.
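
The leaf clustering phase can be sketched in a few lines. The code below is an illustration only; the function name `clustering_components` and the convention that the distorted metric is passed as a Python function are assumptions of this example, and the strict threshold follows condition (2) as stated above.

```python
from itertools import combinations


def clustering_components(leaves, d_hat, m):
    """Build the distorted clustering graph and its connected components.

    An edge {u, v} is present whenever d_hat(u, v) < m. Returns the
    adjacency sets and the list of components (as sets of leaves).
    """
    adj = {u: set() for u in leaves}
    for u, v in combinations(leaves, 2):
        if d_hat(u, v) < m:
            adj[u].add(v)
            adj[v].add(u)

    seen, components = set(), []
    for start in leaves:
        if start in seen:
            continue
        # Depth-first search collecting one connected component.
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            x = stack.pop()
            component.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        components.append(component)
    return adj, components
```

Each returned component plays the role of the support of one tree of the reconstructed forest.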

Building the components I: Mini-reconstruction problem. Fix a
component $\hat{h}_m^{(i)}$ of $\hat{H}_m$. In this and the next phase, we seek to reconstruct
a contracted tree on $\hat{h}_m^{(i)}$. Denote by $T^{(i)}$ the true tree $T$ restricted to the
leaves in $\hat{h}_m^{(i)}$. First, we find all edges of $T^{(i)}$ that are "sufficiently long"
and lie on "sufficiently short" paths. More precisely, we consider all pairs
Algorithm Mini Contractor
Input: Component $\hat{h}_m^{(i)}$; Leaves $u, v$;
Output: Bipartitions $\{\psi_j(u, v)\}_{j=1}^{r(u,v)}$;

• Ball. Let

\[ B(u, v) := \{ w \in \hat{v}_m^{(i)} : \hat{d}(u, w) \vee \hat{d}(v, w) < M \}; \]

• Intersection Points. For all $w \in B(u, v)$, estimate the point of
intersection between $u, v, w$ (distance from $u$), that is,

\[ \hat{\Phi}_w := \tfrac{1}{2} \left( \hat{d}(u, v) + \hat{d}(u, w) - \hat{d}(v, w) \right); \]

• Long Edges. Set $S := B(u, v) - \{u\}$, $x_{-1} := u$, $j := 0$, $C_0 := \{u\}$;

  - Until $S = \emptyset$:

    * Let $x_0 = \arg\min\{\hat{\Phi}_w : w \in S\}$ (break ties arbitrarily);

    * If $\hat{\Phi}_{x_0} - \hat{\Phi}_{x_{-1}} > 2\tau$, create a new edge by setting
      $\psi_{j+1}(u, v) := \{B(u, v) - S, S\}$ and let $C_{j+1} := \{x_0\}$, $j := j + 1$;

    * Else, set $C_j := C_j \cup \{x_0\}$;

    * Set $S := S - \{x_0\}$, $x_{-1} := x_0$;

• Output. Return the bipartitions $\{\psi_j(u, v)\}_{j=1}^{r(u,v)}$ (where $r(u, v)$ is the
number of bipartitions generated in the previous step).

Figure 5: Algorithm Mini Contractor. See Figure 4 for illustration.
of leaves $u, v$ connected by an edge in $\hat{h}_m^{(i)}$, that is, leaves within distorted
distance $m$. For each such pair $u, v$, the mini reconstruction problem consists
in finding all edges $e$ in $P_{T^{(i)}}(u, v)$ that have length $\lambda_e > 4\tau$. To
do this using the distortion $\hat{d}$, we first consider a ball $B(u, v)$ of all nodes
within distorted distance $M$ of $u$ and $v$, that is,

\[ B(u, v) = \{ w \in \hat{v}_m^{(i)} : \hat{d}(u, w) \vee \hat{d}(v, w) < M \}, \]

where $a \vee b$ is the maximum of $a$ and $b$. The point of using this ball is
that we can then guarantee that each edge in $P_{T^{(i)}}(u, v)$ is "witnessed"
by a quartet (i.e., a 4-tuple of leaves) in $B(u, v)$ in the following sense:
let $(x_1, x_2)$ be an edge in $P_{T^{(i)}}(u, v)$ and let $(x_j, y_j)$, $j = 1, 2$, be an edge
adjacent to $x_j$ that is not in $P_{T^{(i)}}(u, v)$; for $j = 1, 2$ let $L^{(i)}_{x_j \to y_j}$ be the leaves
reachable from $y_j$ using paths not including $x_j$; then we will show that
$L^{(i)}_{x_j \to y_j} \cap B(u, v) \neq \emptyset$ for $j = 1, 2$. In other words, there is enough information
in $B(u, v)$ to reconstruct all edges in $P_{T^{(i)}}(u, v)$, at least those that are
"sufficiently long." This phase is detailed in Figure 5. An illustration is
given in Figure 4.
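
The mini reconstruction step can be sketched as follows. This is a simplified illustration of the routine in Figure 5, not a reference implementation: the function name `mini_contractor`, the argument layout, and the assumption that the distorted metric is a Python function with `d_hat(x, x) == 0` are all choices made for this example. Sorting the ball by estimated intersection point replaces the repeated arg-min of the figure.

```python
def mini_contractor(component, d_hat, u, v, tau, M):
    """Sketch of Mini Contractor for one pair of leaves u, v.

    component: iterable of leaves containing u and v;
    d_hat(x, y): distorted distance, assumed to satisfy d_hat(x, x) == 0.
    Returns the list of bipartitions (B - S, S) of the ball B(u, v).
    """
    # Ball: leaves within distorted distance M of both u and v.
    ball = [w for w in component if max(d_hat(u, w), d_hat(v, w)) < M]
    # Estimated distance from u to the intersection point of {u, v, w}.
    phi = {w: 0.5 * (d_hat(u, v) + d_hat(u, w) - d_hat(v, w)) for w in ball}
    # Scan the ball in order of increasing phi, starting right after u.
    order = sorted((w for w in ball if w != u), key=lambda w: phi[w])
    bipartitions = []
    prev = u
    for idx, x0 in enumerate(order):
        # A gap larger than 2 * tau between consecutive intersection
        # points signals a sufficiently long edge on the path from u to v.
        if phi[x0] - phi[prev] > 2 * tau:
            far_side = set(order[idx:])
            bipartitions.append((set(ball) - far_side, far_side))
        prev = x0
    return bipartitions
```

With an exact metric on two cherries joined by an internal edge of length 3 and pendant edges of length 1, a coarse accuracy keeps only the internal edge, while a finer accuracy also separates the pendant edges.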

Building the components II: Extending the bipartitions. The previous
step reconstructs "sufficiently long" edges on balls of the form $B(u, v)$.
By reconstructing an edge on $B(u, v)$, we mean finding the bipartition of
$B(u, v)$ to which the edge corresponds. More precisely:

Definition 6 (Bipartitions) Let $T = (V, E)$ be a multifurcating tree with
no vertex of degree 2. Each edge $e$ in $T$ induces a bipartition of the leaves
$L$ of $T$ as follows: if one removes the edge $e$ from $T$, then one is left with
two connected components; take the partition of the leaves corresponding to
those components. Denote by $b_T(e)$ the bipartition of $e$ on $T$. It is easy
to see that given the bipartitions $\{b_T(e)\}_{e \in E}$ one can reconstruct the tree
$T$ efficiently [Bun71, Mea81, BD86]. (Proceed by sequentially "splitting"
clusters.)

The goal of the second phase in the main loop of our reconstruction algorithm
is to extend the bipartitions previously built from $B(u, v)$ to the full
component $\hat{h}_m^{(i)}$. To perform this task, we use the following observation:
suppose we want to deduce the bipartition corresponding to edge $e$; since
the radius of the ball $B(u, v)$ is much larger than $m$, we can make sure that
a path from a leaf in $\hat{h}_m^{(i)}$ that is outside $B(u, v)$ to a leaf on the other side
of the bipartition $b_T(e)$ is "long." Therefore, we can easily determine what
side of the partition each leaf in $\hat{h}_m^{(i)}$ lies on. For details, see Figure 7. An
illustration is given in Figure 6.

Figure 6: Illustration of routine Extender. See also Figure 7.
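
One iteration of the extension step can be sketched as a graph search. The code is an illustration under stated assumptions: the component is given as adjacency sets of the clustering graph, the function name `extend_bipartition` is hypothetical, and the explicit check that no leaf attaches to both sides stands in for the guarantee that the analysis provides.

```python
def extend_bipartition(adj, side_u, side_v):
    """Sketch of one iteration of Extender.

    adj: adjacency sets of one component of the clustering graph;
    side_u, side_v: the two sides of a bipartition of the ball B(u, v).
    Returns the two extended sides.
    """
    def reachable(start, other):
        # Search in the modified graph K: edges between the two sides
        # are removed, so the search never steps into the opposite side.
        seen, stack = set(start), list(start)
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen and y not in other:
                    seen.add(y)
                    stack.append(y)
        return seen

    ext_u = reachable(side_u, side_v)
    ext_v = reachable(side_v, side_u)
    # A leaf attached to both sides would end up in both searches.
    if ext_u & ext_v:
        raise ValueError("a leaf is connected to both sides in K")
    return ext_u, ext_v
```

Leaves outside the ball are thus assigned to whichever side they remain connected to once the edges crossing the bipartition are deleted.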

3 Analysis of the Algorithm 

We assume throughout that $\hat{d}$ is a $(\tau, M)$-distortion of $d$ and moreover that
$m$ satisfies the conditions of Theorem 1.

3.1 Leaf clustering: Determining the support of the forest 

Recall the notation of Definition 5. 

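Proposition 1 (Leaf Clustering) Let $\tau < M' < M - \tau$. Then

\[ H_{M'-\tau} \subseteq \hat{H}_{M'} \subseteq H_{M'+\tau}. \]

Proof: This follows immediately from the definition of $\hat{d}$. Indeed, if $d(u, v) <
M' - \tau$ then

\[ \hat{d}(u, v) < d(u, v) + \tau < (M' - \tau) + \tau = M'. \]

Similarly, if $\hat{d}(u, v) < M'$ then

\[ d(u, v) < \hat{d}(u, v) + \tau < M' + \tau. \]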
Algorithm Extender
Input: Component $\hat{h}_m^{(i)}$; Bipartitions $\{\psi_j(u, v)\}_{j=1}^{r(u,v)}$; Leaves $u, v$;
Output: Bipartitions $\{\bar{\psi}_j(u, v)\}_{j=1}^{r(u,v)}$;

• For $j = 1, \ldots, r(u, v)$ (unless $r(u, v) = 0$):

  - Initialization. Denote by $\psi_j^{(u)}(u, v)$ the vertex set containing $u$
    in the bipartition $\psi_j(u, v)$, and similarly for $v$. Initialize the extended
    partition $\bar{\psi}_j^{(u)}(u, v) := \psi_j^{(u)}(u, v)$, $\bar{\psi}_j^{(v)}(u, v) := \psi_j^{(v)}(u, v)$;

  - Modified Graph. Let $K$ be $\hat{h}_m^{(i)}$ where all edges between
    $\psi_j^{(u)}(u, v)$ and $\psi_j^{(v)}(u, v)$ have been removed;

  - Extension. For all $w \in \hat{v}_m^{(i)} - (\psi_j^{(u)}(u, v) \cup \psi_j^{(v)}(u, v))$, add $w$ to
    the side of the partition it is connected to in $K$ (by Proposition 6,
    each $w$ as above is connected to exactly one side);

• Return the bipartitions $\{\bar{\psi}_j(u, v)\}_{j=1}^{r(u,v)}$.

Figure 7: Algorithm Extender. See Figure 6 for an illustration.



3.2 Mini-reconstruction: Finding long edges on short paths 

Consider a component $\hat{h}_m^{(i)} = (\hat{v}_m^{(i)}, \hat{e}_m^{(i)})$ of $\hat{H}_m$. Denote by $T^{(i)} = (V^{(i)}, E^{(i)})$
the tree $T$ restricted to the leaves in $\hat{v}_m^{(i)}$, that is,

• Keep only those edges of $T$ that are on paths between leaves in $\hat{v}_m^{(i)}$;

• Glue together edges adjacent to vertices of degree 2;

• Equip $T^{(i)}$ with the metric $d$ restricted to $\hat{v}_m^{(i)} \times \hat{v}_m^{(i)}$ and denote by $\{\lambda_e^{(i)}\}_{e \in E^{(i)}}$
the corresponding weights.

Proposition 2 (Chord Depth of $T^{(i)}$) The chord depth of $T^{(i)}$ is less
than $m + \tau$.

Proof: We argue by contradiction. Let $e$ be an edge in $T^{(i)}$. Suppose that
the chord depth of $e$ in $T^{(i)}$ is $\geq m + \tau$. Consider the bipartition $\{\psi^{(1)}, \psi^{(2)}\}$
defined by $e$ in $T^{(i)}$. Then it follows that for all $u_1 \in \psi^{(1)}$ and $u_2 \in \psi^{(2)}$, we
have

\[ \hat{d}(u_1, u_2) > d(u_1, u_2) - \tau \geq m, \]

so that $\hat{h}_m^{(i)}$ cannot be connected, a contradiction. ■

Let $e' = (u', v')$ be an edge in a tree $T'$ with leaf set $L'$. We denote by
$L'_{u' \to v'}$ the leaves of $T'$ that can be reached from $v'$ without going through
$u'$. Recall that for two leaves $u', v'$ of $T'$, we denote by $\mathcal{P}_{T'}(u', v')$ the set of
vertices on the path between $u'$ and $v'$ in $T'$. Recall also that

\[ B(u, v) = \{ w \in \hat{v}_m^{(i)} : \hat{d}(u, w) \vee \hat{d}(v, w) < M \}, \]

for $u, v \in \hat{v}_m^{(i)}$.



Proposition 3 (Witnesses in $B(u,v)$) Assume that $2m + 3\tau < M$. Let
$(u,v) \in E_m^{(i)}$. Let $(x,y)$ be an edge of $T^{(i)}$ such that $x \in P_{T^{(i)}}(u,v)$ but
$y \notin P_{T^{(i)}}(u,v)$. Then we have

$$B(u,v) \cap L^{(i)}_{x \to y} \neq \emptyset,$$

where $L^{(i)}$ is the set of leaves of $T^{(i)}$.

Proof: By Proposition 2, there are leaves $x_0, y_0$ in $L^{(i)}$ such that
$(x,y) \in P_{T^{(i)}}(x_0, y_0)$ and $d(x_0, y_0) < m + \tau$. Assume without loss of
generality that $y_0 \in L^{(i)}_{x \to y}$. By assumption,

$$d(u,v) \leq \hat d(u,v) + \tau < m + \tau.$$

Therefore,

$$d(u, y_0) \leq d(u, x) + d(x, y_0) \leq d(u,v) + d(x_0, y_0) < 2m + 2\tau,$$

from which we get $\hat d(u, y_0) < 2m + 3\tau < M$. The same inequality holds for
$\hat d(v, y_0)$. ■

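Fix a pair of leaves $u, v$ with $(u,v) \in E_m^{(i)}$. For $w \in B(u,v)$, let

$$\hat\Phi_w := \frac{1}{2}\left(\hat d(u,v) + \hat d(u,w) - \hat d(v,w)\right),$$

and

$$\Phi_w := \frac{1}{2}\left(d(u,v) + d(u,w) - d(v,w)\right).$$

Note that $\Phi_w$ is the distance between $u$ and the intersection point of $\{u,v,w\}$.
Let $\{C_j\}_{j=0}^{r(u,v)}$ and $\{\psi_j(u,v)\}_{j=1}^{r(u,v)}$ be as in Figure 5. We write
$w \sim w'$ if $w, w' \in C_j$ for some $j$. Similarly, we write $w \leq w'$ (respectively
$w < w'$) if $w \in C_j$ and $w' \in C_{j'}$ with $j \leq j'$ (respectively $j < j'$).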
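
As a small illustration of these definitions, the following Python sketch computes the ball $B(u,v)$ and the estimated intersection points on a toy example. The nested-dictionary representation of $\hat d$ and the names `symmetric`, `ball` and `phi_hat` are assumptions made for this example only; the toy distances are exact tree distances (so $\tau = 0$) on the tree with path $u - p - q - v$, where leaf $x$ hangs off $p$ and leaf $y$ hangs off $q$.

```python
def symmetric(pairs):
    """Build a symmetric nested dict of distances with zero diagonal."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {a: 0.0})[b] = value
        d.setdefault(b, {b: 0.0})[a] = value
    return d


def ball(dhat, leaves, u, v, M):
    """Leaves w whose estimated distances to both u and v are below M."""
    return [w for w in leaves if max(dhat[u][w], dhat[v][w]) < M]


def phi_hat(dhat, u, v, w):
    """Estimated distance from u to the intersection point of {u, v, w}."""
    return 0.5 * (dhat[u][v] + dhat[u][w] - dhat[v][w])


# Edge lengths: u-p = 1, p-q = 2, q-v = 1, x-p = 1, y-q = 1.
dhat = symmetric({("u", "v"): 4.0, ("u", "x"): 2.0, ("v", "x"): 4.0,
                  ("u", "y"): 4.0, ("v", "y"): 2.0, ("x", "y"): 4.0})
leaves = ["u", "v", "x", "y"]
for w in ball(dhat, leaves, "u", "v", M=5.0):
    print(w, phi_hat(dhat, "u", "v", w))
```

On this example the values for $x$ and $y$ are 1 and 3, the distances from $u$ to $p$ and to $q$ respectively.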
Proposition 4 (Intersection Points) Let $u, v$ be as above. Then we have
the following:

1. [Identity] If $x, y \in B(u,v)$ are such that $\Phi_x = \Phi_y$ then $x \sim y$;

2. [Precedence] If $x, y \in B(u,v)$ are such that $\Phi_x < \Phi_y$ then $x \leq y$;

3. [Separation] If $x, y \in B(u,v)$ are such that $\Phi_x < \Phi_y - 4\tau$ and there is
no $z \in B(u,v)$ with $\Phi_x < \Phi_z < \Phi_y$, then $x < y$.

Proof: For Part 1, note that $\Phi_x = \Phi_y$ implies

$$|\hat\Phi_x - \hat\Phi_y| < 2\tau.$$

(Note that the term $\hat d(u,v)$ appears in both $\hat\Phi_x$ and $\hat\Phi_y$ and therefore does not
contribute to the error. The same argument applies to the error calculations
below.) Therefore, $x$ and $y$ are necessarily placed in the same $C_j$, that is,
$x \sim y$. See Figure 5.

For Part 2, suppose by contradiction that $x > y$. Then we have necessarily

$$\hat\Phi_x \geq \hat\Phi_y + 2\tau,$$

which implies

$$\Phi_y - \Phi_x < \hat\Phi_y - \hat\Phi_x + 2\tau \leq 0,$$

a contradiction.

For Part 3, let

$$X_0 = \{w \in B(u,v) \text{ s.t. } \Phi_w \leq \Phi_x\},$$
$$Y_0 = \{w \in B(u,v) \text{ s.t. } \Phi_w \geq \Phi_y\},$$
$$x_0 = \arg\max\{\hat\Phi_w : w \in X_0\},$$

(breaking ties arbitrarily) and similarly

$$y_0 = \arg\min\{\hat\Phi_w : w \in Y_0\}.$$

Note that by assumption the pair $X_0, Y_0$ forms a partition of $B(u,v)$. By
assumption,

$$\Phi_{x_0} \leq \Phi_x < \Phi_y - 4\tau \leq \Phi_{y_0} - 4\tau,$$

which implies for all $x' \in X_0$ and $y' \in Y_0$

$$\hat\Phi_{y'} \geq \hat\Phi_{y_0} > \hat\Phi_{x_0} + 4\tau - 2\tau = \hat\Phi_{x_0} + 2\tau \geq \hat\Phi_{x'} + 2\tau.$$

Therefore, we have $x < y$. ■
Proposition 5 (Mini Reconstruction) Let $u, v$ be as above. Assume
that $2m + 3\tau < M$. Then we have the following:

1. [Reconstructed Edges Are Correct] For each $j = 1, \ldots, r(u,v)$, there
is a unique edge $e$ in $\mathcal{E}^{(i)}$ such that

$$b_{T^{(i)}}(e) \cap B(u,v) = \psi_j(u,v),$$

where the intersection on the left is applied separately to each set in
the partition;

2. [Long Edges Are Present] Let $e \in \mathcal{E}^{(i)}$ with $e \in P_{T^{(i)}}(u,v)$ and
$\lambda_e^{(i)} > 4\tau$. Then there is a unique $j$ such that

$$b_{T^{(i)}}(e) \cap B(u,v) = \psi_j(u,v).$$

Proof: Part 1 follows from Proposition 3 and Proposition 4 Part 2. Indeed,
by Proposition 4 Part 2, $\psi_j(u,v)$ is a correct bipartition of $T^{(i)}$ restricted
to $B(u,v)$. It corresponds to a unique edge of the latter tree because it is a
full bipartition of $B(u,v)$. By Proposition 3, every edge of $T^{(i)}$ is witnessed
in $B(u,v)$, so $\psi_j(u,v)$ must also correspond to a unique edge in $T^{(i)}$.

Similarly, Part 2 follows from Proposition 3 and Proposition 4 Parts 2
and 3. ■

3.3 Extending bipartitions: Reconstructing the components 

Let $u, v \in V_m^{(i)}$ with $(u,v) \in E_m^{(i)}$ and let $\psi_j(u,v)$ be one of the bipartitions
returned by Mini Contractor when given $(H_m^{(i)}, u, v)$ as input. Let
$e = (x,y) \in \mathcal{E}^{(i)}$ be the edge of $T^{(i)}$ corresponding to $\psi_j(u,v)$ (as guaranteed by
Proposition 5) and denote its bipartition by

$$b_{T^{(i)}}(e) = \{b^{(u)}, b^{(v)}\},$$

where $b^{(u)}$ and $b^{(v)}$ are respectively the sides containing $u$ and $v$.

Proposition 6 (Leaves Outside Ball) Assume that $2m + 3\tau < M$. Let
$w \in V_m^{(i)} - B(u,v)$. Assume that $w \in b^{(u)}$. Then, for all leaves $w'$ in $b^{(v)}$ we
have

$$\hat d(w, w') > m.$$
Proof: Assume by contradiction that there is $w' \in b^{(v)}$ such that
$\hat d(w, w') \leq m$. The path between $w$ and $w'$ must go through $e$ since $w$ and $w'$ are on
different sides of the partition. Therefore, for one of the endpoints of $e$, say $x$,
we have $d(w, x) < m + \tau$. Also, since
$d(u, x) \leq d(u,v) \leq \hat d(u,v) + \tau < m + \tau$, we have

$$d(w, u) \leq d(w, x) + d(x, u) < 2m + 2\tau < M.$$

We finally get

$$\hat d(w, u) \leq d(w, u) + \tau < 2m + 3\tau < M,$$

and similarly for $\hat d(w, v)$, a contradiction since we assumed $w \notin B(u,v)$. ■

Proposition 7 (Correct Extension) The bipartition $\bar\psi_j(u,v)$ returned by
Extender is correct, that is, $\bar\psi_j(u,v) = b_{T^{(i)}}(e)$.

Proof: Let $K$, $\psi_j^{(u)}(u,v)$, $\psi_j^{(v)}(u,v)$ be as in Figure 7. Since $H_m^{(i)}$ is connected
and we only remove edges between $\psi_j^{(u)}(u,v)$ and $\psi_j^{(v)}(u,v)$ to form $K$, it
follows from Proposition 6 that all vertices in
$V_m^{(i)} - (\psi_j^{(u)}(u,v) \cup \psi_j^{(v)}(u,v))$
are connected in $K$ to either $\psi_j^{(u)}(u,v)$ or $\psi_j^{(v)}(u,v)$. ■

We finally get the following. 

Proposition 8 (Correctness of Main Loop) Let $\{\hat T^{(i)}\}_{i=1}^{q}$ be the trees
obtained at the end of the Main Loop of our algorithm. Then, for all
$i = 1, \ldots, q$, $\hat T^{(i)}$ is a refinement of $F_{4\tau, +\infty}(T^{(i)})$.

Proof: By Propositions 5 and 7, all reconstructed edges are correct and
they include at least those edges longer than $4\tau$. ■

3.4 Path-disjointness: Length and depth of shared edges 

Let $T^{(i_1)}$, $T^{(i_2)}$ be the tree $T$ restricted to components $H_m^{(i_1)}$, $H_m^{(i_2)}$ respectively.
Note that each edge in $T^{(i_j)}$ is actually a path in $T$.

Proposition 9 (Path-Disjointness) For all $u_1, v_1 \in L^{(i_1)}$ and $u_2, v_2 \in L^{(i_2)}$
such that

$$P_T(u_1, v_1) \cap P_T(u_2, v_2) \neq \emptyset,$$

it holds that

1. [Depth of Shared Vertices] We have

$$\min\{\Delta_V(z) : z \in P_T(u_1, v_1) \cap P_T(u_2, v_2)\} \geq \frac{1}{2}(m - 3\tau).$$

2. [Length of Shared Edges] If, further, the paths between $u_1, v_1$ and between
$u_2, v_2$ share at least one edge, then

$$\max\{\lambda_e : e \text{ is an edge shared by the two paths}\} \leq 2\tau.$$

Proof: Let $z \in P_T(u_1, v_1) \cap P_T(u_2, v_2)$. For $j = 1, 2$, by Proposition 2, there
are leaves $x_j, y_j$ in $L^{(i_j)}$ such that $z \in P_{T^{(i_j)}}(x_j, y_j)$ and $d(x_j, y_j) < m + \tau$.

For Part 1, assume without loss of generality that $d(x_2, z) \leq \frac{1}{2}(m + \tau)$.
Then, for all $w \in L^{(i_1)}$,

$$d(w, z) \geq d(w, x_2) - d(z, x_2) \geq m - \tau - \frac{1}{2}(m + \tau) = \frac{1}{2}(m - 3\tau).$$

A similar argument applies to $w \in L^{(i_2)}$ and $w \in L - (L^{(i_1)} \cup L^{(i_2)})$.

For Part 2, let $e = (x, y)$ be an edge shared by the paths between $u_1, v_1$ and
between $u_2, v_2$. Assume without loss of generality that the path from $x$ to $y$
partitions $\{x_1, y_1, x_2, y_2\}$ as $\{\{x_1, x_2\}, \{y_1, y_2\}\}$ in $T$, where
$x_1, x_2, y_1, y_2$ were defined above. We have

$$2d(x,y) = d(x_1, y_1) + d(x_2, y_2) - d(x_1, x_2) - d(y_1, y_2)$$
$$\leq \hat d(x_1, y_1) + \hat d(x_2, y_2) - \hat d(x_1, x_2) - \hat d(y_1, y_2) + 4\tau$$
$$\leq 2m - 2m + 4\tau$$
$$= 4\tau,$$

where the third line follows from the definition of the clustering graph $H_m$.
■

3.5 Proof of Main Theorem 

Proof of Theorem 1: Part 1 follows from Proposition 9. Recall that

$$m < \frac{1}{2}[M - 3\tau].$$

Part 2 then follows from Proposition 8 and Proposition 1. ■

Figure 8: Counter-example: Reference tree $T_0$.



4 Tightness of the Result 

We showed that given a $(\tau, M)$-distortion we reconstruct a subforest of $T$
with chord depth $\approx \frac{1}{2}M$ which includes all edges of length at least $4\tau$. It
may seem that we are losing a factor 2 in the chord depth and that, in fact,
we should be able to reconstruct edges of chord depth close to $M$. But this
is not the case. We show in this section that the chord depth of $\approx \frac{1}{2}M$ is
essentially best possible (up to $O(\tau)$).

Consider the tree $T_0$ depicted in Figure 8. The tree $T_0$ has four leaves
$u, v, x_1, x_2$ with adjacent edges of length respectively $4\tau$, $\frac{1}{2}M + 2\tau$,
$\frac{1}{2}M + 4\tau$, and $\frac{1}{2}M + 4\tau$. The middle edge has length $4\tau$ and the corresponding
bipartition is $\{\{u, x_1\}, \{v, x_2\}\}$. Assume that we have the following
$(\tau, M)$-distortion of the metric corresponding to $T_0$:

$$\hat d_0(u, v) = \tfrac{1}{2}M + 10\tau, \quad \hat d_0(u, x_1) = \tfrac{1}{2}M + 8\tau, \quad \hat d_0(u, x_2) = \tfrac{1}{2}M + 12\tau,$$
$$\hat d_0(v, x_1) = \hat d_0(v, x_2) = \hat d_0(x_1, x_2) = +\infty.$$

Now, note that $\hat d_0$ is also a $(\tau, M)$-distortion for the tree $T_1$ depicted
in Figure 9. The tree $T_1$ has four leaves $u, v, x_1, x_2$ with adjacent edges of
length respectively $4\tau$, $\frac{1}{2}M + 2\tau$, $\frac{1}{2}M$, and $\frac{1}{2}M + 8\tau$. The middle edge has
length $4\tau$ and the corresponding bipartition is $\{\{u, x_2\}, \{v, x_1\}\}$.

Figure 9: Counter-example: Tree $T_1$ with equivalent distortion.

Hence, the two incompatible trees $T_0$ and $T_1$ cannot in general be distinguished
from a $(\tau, M)$-distortion. In particular, note that the middle edge
of $T_0$ has length $4\tau$ and chord depth $\frac{1}{2}M + 10\tau$, yet its bipartition cannot
be recovered. This proves the claim.
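
The arithmetic behind this counter-example can be checked numerically. The sketch below is only an illustration: the helper `quartet_distances` and the concrete values $\tau = 1$, $M = 20$ are assumptions chosen for the example. It computes the leaf-to-leaf distances of both four-leaf trees and confirms that the three finite entries of the distortion coincide, while the remaining three pairs are farther apart than $M + \tau$ in both trees (which, assuming a distortion may report $+\infty$ for such pairs, is why they carry no information).

```python
from itertools import combinations


def quartet_distances(pendant, middle, split):
    """Leaf-to-leaf distances in a four-leaf tree.

    pendant: dict leaf -> length of the edge adjacent to that leaf
    middle:  length of the middle edge
    split:   pair of two-leaf sets, the bipartition induced by the middle edge
    """
    side = {leaf: i for i, group in enumerate(split) for leaf in group}
    return {
        (a, b): pendant[a] + pendant[b] + (middle if side[a] != side[b] else 0)
        for a, b in combinations(sorted(pendant), 2)
    }


tau, M = 1, 20   # illustrative values (assumption); M/2 = 10
half = M // 2

d_T0 = quartet_distances(
    {"u": 4 * tau, "v": half + 2 * tau, "x1": half + 4 * tau, "x2": half + 4 * tau},
    4 * tau, ({"u", "x1"}, {"v", "x2"}))
d_T1 = quartet_distances(
    {"u": 4 * tau, "v": half + 2 * tau, "x1": half, "x2": half + 8 * tau},
    4 * tau, ({"u", "x2"}, {"v", "x1"}))

near = [("u", "v"), ("u", "x1"), ("u", "x2")]    # finite entries of the distortion
far = [("v", "x1"), ("v", "x2"), ("x1", "x2")]   # entries reported as +infinity

assert all(d_T0[p] == d_T1[p] for p in near)
assert all(d_T0[p] > M + tau and d_T1[p] > M + tau for p in far)
print([d_T0[p] for p in near])  # [20, 18, 22]
```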



5 Implementation 



We briefly discuss the running time of the algorithm. 

Building the graph $H_m$ takes time $O(n^2)$, since we have to consider all
pairs of leaves, and we find the connected components of $H_m$ with
Breadth-First-Search in another $O(n^2)$. We argue next that, for $i = 1, \ldots, q$, we
need $O(n_i^5)$ to build $\hat T^{(i)}$, where $n_i = |V_m^{(i)}|$. We show first that for all pairs
of leaves $u$ and $v$, Mini Contractor and Extender take time $O(n_i^3)$.
Indeed, Mini Contractor takes time $O(n_i)$, since its running time is
linear in the size of $B(u,v)$; and Extender takes time $O(n_i^3)$, since for
each bipartition $\psi_j(u,v)$ (there are at most $O(n_i)$ of those) it is enough to
perform a BFS. Given all bipartitions of the tree $T^{(i)}$, we use the standard
TREE POPPING algorithm of [Mea81, BD86] to build $\hat T^{(i)}$; since we have
$O(n_i^3)$ bipartitions (not all of them distinct) this last step takes time $O(n_i^4)$.
So for each tree $i$ we need $O(n_i^5)$, and summing over $i$'s the total running
time becomes $O(n^5)$.
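
The first step, building the clustering graph and extracting its connected components by breadth-first search, can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function name `clustering_components`, the nested-dictionary representation of $\hat d$, and the convention that two leaves are joined exactly when $\hat d(u,v) < m$ (strict inequality) are assumptions made for the example.

```python
import math
from collections import deque


def clustering_components(dhat, leaves, m):
    """Connected components of the graph joining leaves with dhat < m.

    dhat: nested dict of estimated distances, dhat[a][b] (may be math.inf).
    """
    # O(n^2): examine every ordered pair of leaves once.
    adj = {a: [b for b in leaves if b != a and dhat[a][b] < m] for a in leaves}

    seen, components = set(), []
    for start in leaves:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:  # breadth-first search from `start`
            a = queue.popleft()
            component.append(a)
            for b in adj[a]:
                if b not in seen:
                    seen.add(b)
                    queue.append(b)
        components.append(component)
    return components


inf = math.inf
dhat = {
    "a": {"a": 0.0, "b": 1.0, "c": inf, "d": inf},
    "b": {"a": 1.0, "b": 0.0, "c": inf, "d": inf},
    "c": {"a": inf, "b": inf, "c": 0.0, "d": 1.0},
    "d": {"a": inf, "b": inf, "c": 1.0, "d": 0.0},
}
print(clustering_components(dhat, ["a", "b", "c", "d"], m=2.0))
# [['a', 'b'], ['c', 'd']]
```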

We can improve on this running time by a more efficient implementation
of Extender as follows. For all $j = 0, \ldots, r(u,v)$, we remove from the graph
$H_m^{(i)}$ all leaves in $\bigcup_{j' \neq j} C_{j'}$ and perform a Breadth-First-Search to discover the
leaves $K_j \subseteq V_m^{(i)} \setminus B(u,v)$ reachable in $H_m^{(i)}$ from the leaves in $C_j$. From an easy
modification of Propositions 5 and 6, it follows that for every $w \notin B(u,v)$
there is at most one $j \in \{0, \ldots, r(u,v)\}$ such that $w$ is connected to a leaf in
$C_j$. Given this, we can argue that we can recover the bipartitions $\bar\psi_j(u,v)$,
$j = 1, \ldots, r(u,v)$, from the $K_j$'s. The overall time needed by the BFS's is
$O(n_i^2)$, hence $\{\bar\psi_j(u,v)\}_{j=1}^{r(u,v)}$ can be computed in time $O(n_i^2)$ and our total running
time becomes $O(n^4)$.

The above implementation is wasteful in running a BFS for every pair of
leaves $u$ and $v$ with the possibility of creating as many as $O(n^3)$ bipartitions,
each requiring $O(n)$ storage. Note that there are in fact at most $n$ distinct
bipartitions in $T$. To improve on the running time one may need to combine
the BFS's performed in the above implementation by interleaving the Mini
Contractor and Extender steps with the TREE POPPING algorithm.

6 Concluding remarks 

An interesting question for future work is whether the approximate disjoint- 
ness in our results can be avoided. Since we guarantee that any shared edge 
lies deep inside the forest, it is tempting to simply remove all deep edges 
(say beyond m/4) from the output forest. Unfortunately, many of these 
edges may in fact be contracted and moreover they may be clustered in "su- 
pernodes" including both deep and not-so-deep edges. It does not seem to 
be a trivial task to break these deep supernodes apart and preserve strong 
reconstruction guarantees. 
A Log-Det Estimator

For completeness, we relate the definition of the distorted metric (see Defini- 
tion 1) to its biological context. In phylogenetic reconstruction, a distorted 
metric is naturally derived from samples of a Markov model on a tree — a 
common model of DNA sequence evolution used in Biology. 

Definition 7 (Markov model on a tree) A Markov model on a tree is
the following stochastic process:

• Let $T_\rho = (V, E, \rho)$ be a finite tree rooted at $\rho$. Denote by $E_\downarrow$ the set $E$
directed away from the root.

• Let $L = [n]$ be the leaf set of $T_\rho$.

• Let $\mathcal{R}$ be a finite set with $r$ elements.

• Associate to each edge $e \in E_\downarrow$ an $r \times r$ stochastic matrix $M(e)$ with
$\det M(e) > 0$.

• Let $\pi_\rho$ be a distribution on $\mathcal{R}$ with $\pi_\rho(a) > 0$ for all $a \in \mathcal{R}$.

The process runs as follows. Pick a state for the root according to $\pi_\rho$. Moving
away from the root toward the leaves, apply the channel $M(e)$ to each edge
$e$ independently. Denote the state so obtained $\sigma_V = (\sigma_v)_{v \in V}$. In particular,
$\sigma_{[n]}$ is the state at the leaves. More precisely, the joint distribution of $\sigma_V$ is
given by

$$\mu_V(\sigma_V) = \pi_\rho(\sigma_\rho) \prod_{e = (x,y) \in E_\downarrow} M_{\sigma_x \sigma_y}(e),$$

and therefore the distribution at the leaves is

$$\mu_L(\sigma_L) = \sum_{\sigma'_V : \sigma'_L = \sigma_L} \pi_\rho(\sigma'_\rho) \prod_{e = (x,y) \in E_\downarrow} M_{\sigma'_x \sigma'_y}(e).$$

For $W \subseteq V$, we denote by $\mu_W$ the marginal of $\mu_V$ at $W$.

More generally, we are given $k$ independent samples from the same
Markov model. We think of $(\sigma_l^i)_{i=1}^{k}$ as the sequence at $l \in [n]$. Typically in
biological applications $\mathcal{R} = \{A, G, C, T\}$. Markov models on trees (MMTs) model
how DNA sequences stochastically evolve by point mutations along an evolutionary
tree, under the assumption that each site in the sequences evolves independently.
In the phylogenetic reconstruction problem, we are given sequences $(\sigma^i_{[n]})_{i=1}^{k}$
(one sequence for each extant species) and our goal is to recover the
generating tree, or more precisely its unrooted version (the root is typically
not identifiable [Ste94]). A natural place to start is to measure a notion of
"distance" between the leaves. That is, we seek to associate to an MMT an
additive metric as defined in Definition 8. In general, this can be achieved
using the so-called log-det distance.

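Definition 8 (Log-Det Distance [Ste94]. See also [BH87, LSHP94, Lak94].)
Consider the Markov model in Definition 7. Associate to each edge
$e = (u, v) \in E_\downarrow$ a weight $\lambda(e)$ as follows:

• If $e$ is a leaf edge then

$$\lambda(e) = -\log\det M(e) - \frac{1}{2}\log \prod_{a' \in \mathcal{R}} \mu_u(a').$$

• Otherwise

$$\lambda(e) = -\log\det M(e) - \frac{1}{2}\log \prod_{a' \in \mathcal{R}} \mu_u(a') + \frac{1}{2}\log \prod_{a' \in \mathcal{R}} \mu_v(a').$$

The log-det distance is defined as: $\forall u, v \in L$,

$$d(u, v) = \sum_{e \in P_T(u,v)} \lambda(e) = -\log\det F(u,v),$$

where

$$F_{a'a''}(u,v) = \mathbb{P}[\sigma_u = a', \sigma_v = a''].$$

It was shown in [Ste94] that the log-det distance is indeed an additive metric.

When the sequence length $k$ is finite, we can only obtain an estimate $\hat d$ of $d$:

$$\hat d(u, v) = -\log\det \hat F(u,v),$$

where

$$\hat F_{a'a''}(u,v) = \frac{1}{k} \sum_{i=1}^{k} \mathbf{1}\{\sigma_u^i = a', \sigma_v^i = a''\}.$$

The next lemma, a slight generalization of Proposition 2.1 in [Mos07], shows
that such an estimator constitutes a distorted metric.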
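
The estimator can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function names `determinant` and `logdet_estimate`, the default alphabet ordering, and the convention of returning $+\infty$ when the empirical determinant is not positive are all choices made for this example.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return result


def logdet_estimate(seq_u, seq_v, alphabet="AGCT"):
    """Empirical log-det distance between two aligned sequences."""
    if len(seq_u) != len(seq_v) or not seq_u:
        raise ValueError("sequences must be non-empty and of equal length")
    k = len(seq_u)
    index = {state: i for i, state in enumerate(alphabet)}
    # F_hat[a'][a''] = fraction of sites with state a' at u and a'' at v.
    f_hat = [[0.0] * len(alphabet) for _ in alphabet]
    for s, t in zip(seq_u, seq_v):
        f_hat[index[s]][index[t]] += 1.0 / k
    det = determinant(f_hat)
    # Assumption: report +infinity when the logarithm is undefined.
    return -math.log(det) if det > 0 else math.inf


print(logdet_estimate("AGCTAGCT", "AGCTAGCA"))
```

Note that, with this definition, two identical sequences do not get distance zero: for instance, identical sequences with uniform state frequencies over four states give $-\log\det(\frac{1}{4}I) = \log 256$.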
Lemma 1 (Log-Det Distance: Distorted Metric) Let $\hat d$ be the estimator
defined above. Then there is a constant $A > 0$ such that if one chooses
$(\tau, M)$ with

$$k > A \, \frac{e^{2M + 4\tau}}{(1 - e^{-\tau})^2} \log n,$$

then $\hat d$ is a $(\tau, M)$-distortion with probability $1 - 1/\mathrm{poly}(n)$.

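Proof: Fix $u, v \in L$. Denote $F = F(u,v)$, $\hat F = \hat F(u,v)$, $\omega = d(u,v)$, and
$\hat\omega = \hat d(u,v)$. We assume that $k$ is at least $O(\log n)$. Let $\hat F'$ be $\hat F$ with
one sample arbitrarily changed. It was argued in [ESSW99b] that there are
constants $c_1, c_2$ such that

$$|\det \hat F - \det \hat F'| \leq \frac{c_1}{k}$$

and

$$|\det F - \mathbb{E}[\det \hat F]| \leq \frac{c_2}{k}.$$

Assume $\omega < M + 2\tau$ (in particular if $\omega < M + \tau$). By Azuma's inequality,

$$\mathbb{P}[\hat\omega > \omega + \tau] = \mathbb{P}\left[\det \hat F - \det F < -(\det F)(1 - e^{-\tau})\right]$$
$$\leq \mathbb{P}\left[\det \hat F - \mathbb{E}[\det \hat F] < -e^{-M-2\tau}(1 - e^{-\tau}) + \frac{c_2}{k}\right]$$
$$\leq \exp\left(-\frac{\left(e^{-M-2\tau}(1 - e^{-\tau}) - \frac{c_2}{k}\right)^2}{2k(c_1/k)^2}\right),$$

where we assume $k$ is large enough that

$$e^{-M-2\tau}(1 - e^{-\tau}) - \frac{c_2}{k} > 0.$$

The same inequality holds for $\mathbb{P}[\hat\omega < \omega - \tau]$.

On the other hand, assume $\omega > M + 2\tau$. Then,

$$\mathbb{P}[\hat\omega \leq M + \tau] \leq \mathbb{P}\left[\det \hat F - \mathbb{E}[\det \hat F] \geq e^{-M-\tau} - e^{-M-2\tau} - \frac{c_2}{k}\right]$$
$$\leq \exp\left(-\frac{\left(e^{-M-\tau}(1 - e^{-\tau}) - \frac{c_2}{k}\right)^2}{2k(c_1/k)^2}\right).$$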